Practical Algorithms and Fixed-Parameter Tractability for the Single Individual SNP Haplotyping Problem
Abstract
Single nucleotide polymorphisms (SNPs) are the most frequent form of human genetic variation and are of foremost importance for a variety of applications, including medical diagnostics, phylogenies, and drug design. The complete SNP sequence information from each of the two copies of a given chromosome in a diploid genome is called a haplotype. The Haplotyping Problem for a single individual is as follows: given a set of fragments from one individual’s DNA, find a maximally consistent pair of SNP haplotypes (one per chromosome copy) by removing data “errors” related to sequencing errors, repeats, and paralogous recruitment. Two versions of the problem are considered: Minimum Fragment Removal (MFR) and Minimum SNP Removal (MSR). The Haplotyping Problem was introduced in [8], where it was proved that both MSR and MFR are polynomially solvable when each fragment covers a set of consecutive SNPs (i.e., it is a gapless fragment), and NP-hard in general. The original algorithms of [8] are of theoretical interest, but by no means practical: one relies on finding a maximum stable set in a perfect graph, and the other is a reduction to a network flow problem. Furthermore, the reduction does not work when some fragments are completely included in others, and neither algorithm can be generalized to deal with a bounded total number of holes in the data. In this paper, we give the first practical algorithms for the Haplotyping Problem, based on dynamic programming. Our algorithms do not require that no fragment include another, and they are polynomial for each constant $k$ bounding the total number of holes in the data. For $m$ SNPs and $n$ fragments, we give an $O(mn^{2k+2})$ algorithm for the MSR problem and an $O(2^{2k}m^2n + 2^{3k}m^3)$ algorithm for the MFR problem, when each fragment has at most $k$ holes. In particular, we obtain an $O(mn^2)$ algorithm for MSR and an $O(m^2n + m^3)$ algorithm for MFR on gapless fragments. Finally, we prove that both MFR and MSR are APX-hard in general.
Research partially done while enjoying hospitality at BRICS, Department of Computer Science, University of Aarhus, Denmark. R. Guigó and D. Gusfield (Eds.): WABI 2002, LNCS 2452, pp. 29–43, 2002. © Springer-Verlag Berlin Heidelberg 2002
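The MFR problem stated in the abstract can be illustrated on a toy instance. The sketch below is not the dynamic programming algorithm of the paper; it is an exponential brute-force search meant only to make the problem statement concrete. It assumes the conflict-graph view commonly used in the haplotyping literature, which the abstract itself does not spell out: two fragments conflict if they disagree at an SNP that both cover, and an instance is consistent when the fragments split into two conflict-free groups (one per chromosome copy). The alphabet `A`/`B`, the hole marker `-`, and all function names are hypothetical choices for this illustration.

```python
from itertools import combinations


def conflict(f, g):
    """Two fragments conflict if they disagree at an SNP both cover ('-' is a hole)."""
    return any(a != b and a != "-" and b != "-" for a, b in zip(f, g))


def is_consistent(fragments):
    """True if the fragments split into two conflict-free groups,
    i.e. the (assumed) conflict graph is bipartite."""
    n = len(fragments)
    colour = [None] * n
    for start in range(n):
        if colour[start] is not None:
            continue
        colour[start] = 0
        stack = [start]
        while stack:
            u = stack.pop()
            for v in range(n):
                if v == u or not conflict(fragments[u], fragments[v]):
                    continue
                if colour[v] is None:
                    colour[v] = 1 - colour[u]
                    stack.append(v)
                elif colour[v] == colour[u]:
                    return False  # odd cycle: no valid split into two haplotypes
    return True


def brute_force_mfr(fragments):
    """Smallest set of fragment indices whose removal leaves a consistent instance.
    Tries all subsets by increasing size, so it is only usable on tiny inputs."""
    n = len(fragments)
    for r in range(n + 1):
        for removed in combinations(range(n), r):
            kept = [f for i, f in enumerate(fragments) if i not in removed]
            if is_consistent(kept):
                return removed
    return tuple(range(n))


# Four SNPs; fragments 0-4 agree with haplotypes AAB- / BBA-, fragment 5 is erroneous.
fragments = ["AA--", "BB--", "-AB-", "-BA-", "BB--", "AB--"]
print(brute_force_mfr(fragments))  # (5,)
```

Here removing fragment 5 alone restores consistency, so the MFR optimum for this toy instance is one fragment. The search tries up to $2^n$ subsets, which is why polynomial algorithms for gapless or bounded-hole fragments matter in practice.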
Similar resources
O-36: Genome Haplotyping and Detection of Meiotic Homologous Recombination Sites in Single Cells, A Generic Method for Preimplantation Genetic Diagnosis
Background: Haplotyping is invaluable not only to identify genetic variants underlying a disease or trait, but also to study evolution and population history as well as meiotic and mitotic recombination processes. Current genome-wide haplotyping methods rely on genomic DNA that is extracted from a large number of cells. Thus far random allele drop out and preferential amplification artifacts of...
Polynomial and APX-hard cases of the individual haplotyping problem
SNP haplotyping problems have been the subject of extensive research in the last few years, and are one of the hottest areas of Computational Biology today. In this paper we report on our work of the last two years, whose preliminary results were presented at the European Symposium on Algorithms (Proceedings of the Annual European Symposium on Algorithms (ESA), Vol. 2161. Lecture Notes in Compu...
Models and Algorithms for Haplotyping Problem
One of the main topics in genomics is to determine the relevance of DNA variations with some genetic disease. Single nucleotide polymorphism (SNP) is the most frequent and important form of genetic variation which involves a single DNA base. The values of a set of SNPs on a particular chromosome copy define a haplotype. Because of its importance in the studies of complex disease association, ha...
Extended Islands of Tractability for Parsimony Haplotyping
Parsimony haplotyping is the problem of finding a smallest size set of haplotypes that can explain a given set of genotypes. The problem is NP-hard, and many heuristic and approximation algorithms as well as polynomial-time solvable special cases have been discovered. We propose improved fixed-parameter tractability results with respect to the parameter “size of the target haplotype set” k by p...
A fixed-parameter tractability result for multicommodity demand flow in trees
We study an NP-hard (and MaxSNP-hard) problem in trees, Multicommodity Demand Flow, dealing with demand flows between pairs of nodes and trying to maximize the value of the routed flows. This problem has been intensively studied for trees as well as for general graphs mainly from the viewpoint of polynomial-time approximation algorithms. By way of contrast, we provide an exact dynamic programming ...